import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('fivethirtyeight')
import folium
import plotly.express as px
# Exporting data into the Notebook.
ev = pd.read_csv('Electric_Vehicle_Population_Data.csv')
ev
| VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | DOL Vehicle ID | Vehicle Location | Electric Utility | 2020 Census Tract | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5YJXCAE26J | Yakima | Yakima | WA | 98908.0 | 2018 | TESLA | MODEL X | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 14.0 | 141151601 | POINT (-120.56916 46.58514) | PACIFICORP | 5.307700e+10 |
| 1 | JHMZC5F37M | Kitsap | Poulsbo | WA | 98370.0 | 2021 | HONDA | CLARITY | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 47 | 0 | 23.0 | 171566447 | POINT (-122.64681 47.73689) | PUGET SOUND ENERGY INC | 5.303509e+10 |
| 2 | 5YJ3E1EB0K | King | Seattle | WA | 98199.0 | 2019 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 220 | 0 | 36.0 | 9426525 | POINT (-122.40092 47.65908) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) | 5.303301e+10 |
| 3 | 1N4AZ0CP5D | King | Seattle | WA | 98119.0 | 2013 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 75 | 0 | 36.0 | 211807760 | POINT (-122.3684 47.64586) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) | 5.303301e+10 |
| 4 | 5YJSA1E21H | Thurston | Lacey | WA | 98516.0 | 2017 | TESLA | MODEL S | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 210 | 0 | 22.0 | 185810306 | POINT (-122.75379 47.06316) | PUGET SOUND ENERGY INC | 5.306701e+10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 130438 | 7SAYGDEE6P | Pierce | Gig Harbor | WA | 98335.0 | 2023 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | Eligibility unknown as battery range has not b... | 0 | 0 | 26.0 | 231134102 | POINT (-122.58354539999999 47.32344880000005) | BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM... | 5.305307e+10 |
| 130439 | 1N4BZ1CV7N | Pierce | Tacoma | WA | 98408.0 | 2022 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Eligibility unknown as battery range has not b... | 0 | 0 | 29.0 | 185810943 | POINT (-122.43810499999995 47.203220000000044) | BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM... | 5.305306e+10 |
| 130440 | 5YJYGDEE8M | King | Seattle | WA | 98109.0 | 2021 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | Eligibility unknown as battery range has not b... | 0 | 0 | 36.0 | 176542418 | POINT (-122.35022 47.63824) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) | 5.303301e+10 |
| 130441 | 5YJXCBE22L | Island | Camano Island | WA | 98282.0 | 2020 | TESLA | MODEL X | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 293 | 0 | 10.0 | 102834938 | POINT (-122.40049 48.23986) | BONNEVILLE POWER ADMINISTRATION||PUD 1 OF SNOH... | 5.302997e+10 |
| 130442 | 5YJ3E1EA5M | Pierce | Puyallup | WA | 98375.0 | 2021 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | Eligibility unknown as battery range has not b... | 0 | 0 | 2.0 | 180473639 | POINT (-122.30116 47.1165) | BONNEVILLE POWER ADMINISTRATION||CITY OF TACOM... | 5.305307e+10 |
130443 rows × 17 columns
for i in ev.columns:
print(i)
VIN (1-10) County City State Postal Code Model Year Make Model Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility Electric Range Base MSRP Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
ev.shape
(130443, 17)
# Total Information about the Dataset
ev_info = pd.DataFrame(columns = ['Name of col.', 'no. of Null Val.', 'DataType', 'Unique Value'])
for i in range(0, len(ev.columns)):
ev_info.loc[i] = [ev.columns[i],
ev[ev.columns[i]].isnull().sum(),
ev[ev.columns[i]].dtype,
ev[ev.columns[i]].nunique()]
ev_info
| Name of col. | no. of Null Val. | DataType | Unique Value | |
|---|---|---|---|---|
| 0 | VIN (1-10) | 0 | object | 8827 |
| 1 | County | 3 | object | 166 |
| 2 | City | 3 | object | 656 |
| 3 | State | 0 | object | 46 |
| 4 | Postal Code | 3 | float64 | 787 |
| 5 | Model Year | 0 | int64 | 21 |
| 6 | Make | 0 | object | 35 |
| 7 | Model | 222 | object | 121 |
| 8 | Electric Vehicle Type | 0 | object | 2 |
| 9 | Clean Alternative Fuel Vehicle (CAFV) Eligibility | 0 | object | 3 |
| 10 | Electric Range | 0 | int64 | 102 |
| 11 | Base MSRP | 0 | int64 | 31 |
| 12 | Legislative District | 305 | float64 | 49 |
| 13 | DOL Vehicle ID | 0 | int64 | 130443 |
| 14 | Vehicle Location | 33 | object | 773 |
| 15 | Electric Utility | 3 | object | 75 |
| 16 | 2020 Census Tract | 3 | float64 | 2042 |
ev.drop(columns = ['2020 Census Tract', 'DOL Vehicle ID'], inplace= True)
ev[ev['City'].isna()]
| VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | Vehicle Location | Electric Utility | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 102 | 1N4AZ0CP1D | NaN | NaN | AP | NaN | 2013 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 75 | 0 | NaN | NaN | NaN |
| 103 | 5YJ3E1EA5K | NaN | NaN | BC | NaN | 2019 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 220 | 0 | NaN | NaN | NaN |
| 127670 | 5YJRE11B48 | NaN | NaN | BC | NaN | 2008 | TESLA | ROADSTER | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 220 | 98950 | NaN | NaN | NaN |
ev.dropna(subset = ['City'], inplace=True)
ev[ev['Model'].isna()]['VIN (1-10)'].apply(lambda x: x[2 : 6]).unique()
# applies a lambda function to each value in the selected column. The lambda function takes a single argument,
# x, and returns the substring of x starting at index 2 and ending at index 6.
array(['4ED3'], dtype=object)
# Analyzing the EV with same model code
subset_4ED3 = ev[ev['VIN (1-10)'].str.contains('4ED3')]
subset_4ED3[['Model Year', 'Make', 'Model', 'Electric Vehicle Type']]
| Model Year | Make | Model | Electric Vehicle Type | |
|---|---|---|---|---|
| 533 | 2023 | VOLVO | NaN | Battery Electric Vehicle (BEV) |
| 578 | 2022 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 1023 | 2022 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 1070 | 2021 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 1167 | 2022 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| ... | ... | ... | ... | ... |
| 129509 | 2021 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 129597 | 2023 | VOLVO | C40 | Battery Electric Vehicle (BEV) |
| 129736 | 2022 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 130238 | 2021 | VOLVO | XC40 | Battery Electric Vehicle (BEV) |
| 130379 | 2023 | VOLVO | C40 | Battery Electric Vehicle (BEV) |
963 rows × 4 columns
#The missing values are of Volvo which have launches two models XC40 and C40
print(subset_4ED3['Make'].unique())
print(subset_4ED3['Model'].unique())
['VOLVO'] [nan 'XC40' 'C40']
#All the missing values belong to the year 2023
subset_4ED3[subset_4ED3['Model'].isna()]['Model Year'].value_counts()
2023 222 Name: Model Year, dtype: int64
#Analyse the year associated with the model variant and impute accordingly
subset_4ED3.groupby(['Model'])['Model Year'].value_counts()
Model Model Year
C40 2022 128
2023 118
XC40 2021 247
2022 242
2023 6
Name: Model Year, dtype: int64
ev.fillna({'Model':'C40'},inplace = True)
ev.head(3)
| VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | Vehicle Location | Electric Utility | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5YJXCAE26J | Yakima | Yakima | WA | 98908.0 | 2018 | TESLA | MODEL X | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 14.0 | POINT (-120.56916 46.58514) | PACIFICORP |
| 1 | JHMZC5F37M | Kitsap | Poulsbo | WA | 98370.0 | 2021 | HONDA | CLARITY | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 47 | 0 | 23.0 | POINT (-122.64681 47.73689) | PUGET SOUND ENERGY INC |
| 2 | 5YJ3E1EB0K | King | Seattle | WA | 98199.0 | 2019 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 220 | 0 | 36.0 | POINT (-122.40092 47.65908) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) |
ev['Legislative District'].mean()
29.577025926324364
ev['Legislative District'] = ev['Legislative District'].fillna(29.0)
ev['Vehicle Location'].mode()
0 POINT (-122.13158 47.67858) Name: Vehicle Location, dtype: object
ev['Vehicle Location'] = ev['Vehicle Location'].fillna('POINT (-122.13158 47.67858)')
ev.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 130440 entries, 0 to 130442 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 VIN (1-10) 130440 non-null object 1 County 130440 non-null object 2 City 130440 non-null object 3 State 130440 non-null object 4 Postal Code 130440 non-null float64 5 Model Year 130440 non-null int64 6 Make 130440 non-null object 7 Model 130440 non-null object 8 Electric Vehicle Type 130440 non-null object 9 Clean Alternative Fuel Vehicle (CAFV) Eligibility 130440 non-null object 10 Electric Range 130440 non-null int64 11 Base MSRP 130440 non-null int64 12 Legislative District 130440 non-null float64 13 Vehicle Location 130440 non-null object 14 Electric Utility 130440 non-null object dtypes: float64(2), int64(3), object(10) memory usage: 15.9+ MB
ev.head()
| VIN (1-10) | County | City | State | Postal Code | Model Year | Make | Model | Electric Vehicle Type | Clean Alternative Fuel Vehicle (CAFV) Eligibility | Electric Range | Base MSRP | Legislative District | DOL Vehicle ID | Vehicle Location | Electric Utility | 2020 Census Tract | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5YJXCAE26J | Yakima | Yakima | WA | 98908.0 | 2018 | TESLA | MODEL X | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 238 | 0 | 14.0 | 141151601 | POINT (-120.56916 46.58514) | PACIFICORP | 5.307700e+10 |
| 1 | JHMZC5F37M | Kitsap | Poulsbo | WA | 98370.0 | 2021 | HONDA | CLARITY | Plug-in Hybrid Electric Vehicle (PHEV) | Clean Alternative Fuel Vehicle Eligible | 47 | 0 | 23.0 | 171566447 | POINT (-122.64681 47.73689) | PUGET SOUND ENERGY INC | 5.303509e+10 |
| 2 | 5YJ3E1EB0K | King | Seattle | WA | 98199.0 | 2019 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 220 | 0 | 36.0 | 9426525 | POINT (-122.40092 47.65908) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) | 5.303301e+10 |
| 3 | 1N4AZ0CP5D | King | Seattle | WA | 98119.0 | 2013 | NISSAN | LEAF | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 75 | 0 | 36.0 | 211807760 | POINT (-122.3684 47.64586) | CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA) | 5.303301e+10 |
| 4 | 5YJSA1E21H | Thurston | Lacey | WA | 98516.0 | 2017 | TESLA | MODEL S | Battery Electric Vehicle (BEV) | Clean Alternative Fuel Vehicle Eligible | 210 | 0 | 22.0 | 185810306 | POINT (-122.75379 47.06316) | PUGET SOUND ENERGY INC | 5.306701e+10 |
# Graphical representation of "Model Year" using Histogram.
plt.hist(x= ev['Model Year'], edgecolor='white', bins= 15, color='g')
plt.title('Distribution of Model Year')
plt.xlabel('Year')
plt.ylabel('Frequency')
plt.tight_layout()
# The below code will group the count of the intersection Models and Make together.
car_popularity = ev.value_counts(['Make', 'Model'],ascending = True)
car_popularity
Make Model
CHEVROLET S-10 PICKUP 1
PORSCHE 918 1
BENTLEY FLYING SPUR 1
BMW 745LE 2
BENTLEY BENTAYGA 2
...
CHEVROLET BOLT EV 5335
TESLA MODEL S 7399
NISSAN LEAF 12960
TESLA MODEL Y 22078
MODEL 3 25310
Length: 121, dtype: int64
car_popularity.tail(10).plot(kind='barh', title='Car Model Polularity', figsize = (20,10),
ylabel ='Total Number of Cars',
xlabel ='Top 10 Popular Models', fontsize = 15)
<AxesSubplot:title={'center':'Car Model Polularity'}, ylabel='Top 10 Popular Models'>
top_companies = ev.value_counts(['Make'] ,ascending = True)
top_companies.tail(10).plot(kind= 'bar', title = 'Top EV Companies', figsize = (20, 10), fontsize = 15,
xlabel = 'Top 10 EV Companies',
ylabel = 'Total No of cars',)
<AxesSubplot:title={'center':'Top EV Companies'}, xlabel='Top 10 EV Companies', ylabel='Total No of cars'>
model_year_data = ev['Model Year'].value_counts().sort_index(ascending = False).head(11)
decade_change = model_year_data.pct_change(periods=-1) * 100
# Plotting percentage change. (Except 2023 as the year is ongoing)
plt.plot(decade_change[1:], color= 'g', marker = 'o', markerfacecolor = 'k')
plt.axhline(y=0, color='red', ls= '--')
plt.title('% change over last decade')
plt.xlabel('Year')
plt.ylabel('% Change')
Text(0, 0.5, '% Change')
# Calculating missing (0) values % for 'Electric Range'.
(ev['Electric Range'] == 0).sum()/len(ev)*100
40.973627721557804
ev.rename(columns={'Base MSRP': 'Base_MSRP'}, inplace=True)
# Calculating missing (0) values % for 'Base MSRP'.
(ev['Base_MSRP'] == 0).sum()/len(ev)*100
97.37969841233335
# Filtering out the missing data.
new_MSRP = ev[ev['Base_MSRP']>0]
# Calculating the mean price of each Make.
ev2 = pd.DataFrame(new_MSRP.groupby(['Make'])['Base_MSRP'].mean())
ev2.sort_values(by= 'Base_MSRP', ascending= False, inplace= True)
plt.figure(figsize=(20,10))
sns.barplot( y=ev2.index, x=ev2.Base_MSRP,palette='Greens_r')
plt.title("Mean Base MSRP for each Make")
Text(0.5, 1.0, 'Mean Base MSRP for each Make')
# Summary for counting values of each 'model' in "Treemap".
summary = pd.DataFrame(new_MSRP.groupby(['Make'])['Model'].value_counts())
summary.columns = ['count']
summary = summary.reset_index()
px.treemap(data_frame = summary, path=['Make' , 'Model'], values= 'count',
color_discrete_sequence = px.colors.sequential.Greens_r,
title = 'count of Models of each Make.')
# Pie-chart for type of EV.
fig, ax = plt.subplots()
sizes = ev['Electric Vehicle Type'].value_counts()
labels = 'BEV', 'PHEV'
ax.pie(sizes,labels=labels,autopct='%1.1f%%',shadow=True, startangle=90,wedgeprops={'edgecolor':'black'},
colors=['#1e9570','#F5F5F5'])
ax.axis('equal')
ax.set_title('Type of EV')
Text(0.5, 1.0, 'Type of EV')
# Pie-chart for CAFV Eligibility.
fig, ax = plt.subplots()
sizes = ev['Clean Alternative Fuel Vehicle (CAFV) Eligibility'].value_counts()
labels = 'Eligible', 'Unknown', 'Not Eligible'
ax.pie(sizes,labels=labels,autopct='%1.1f%%',shadow=True, startangle=90,wedgeprops={'edgecolor':'black'},
colors=['#1e9570','#F5F5F5','#E84855'])
ax.axis('equal')
ax.set_title('CAFV Eligibility.')
Text(0.5, 1.0, 'CAFV Eligibility.')
Insights:
1. There are two types of EVs with Battery Electric Vehicle being the dominant one.
2. BEV have also far superior range averaging around 200 miles while PHEV have range of only 20-50 miles.
3. We here do not have enough data to prove, but according to present data, PHEV are much cheaper than BEV.
4. Barely 12% of the vehicles do not have clean alternative feul eligibility which indicates a greener initiative.
Story :
The Dataset contains 130443 rows and 17 columns. The columns contains geographical as well as various model
details about the Vehicles registered under Washingtion DOL DepartmentDOL Vehicle ID is the unique identifier for our data
VIN (1-10) column is not unique which might seem counterintutive but in reality it is truncated and the first 9 digits
explain world manufacturer info and vehicle descriptor section and the 10th digit encodes the model year.
We also notice few missing values. The model and legislative distict columns in particular have quite a few of them
and we will look to impute it or drop later. We also drop DOL Vehicle ID and 2020 Census Tract column as it is not relevant
to our anaylsis.
'Model 3' of 'Tesla car comapany' has sold most of the cars in electric vehicle market followed by Nissan,
Chevrolet, Ford and BMW.
The EV Industry has been drastically rising over the last decade.The percent change over the decade shows
a consistent rise, except the year 2019 which saw a sharp decline but a quick recovery afterwards.
Note that the % change graph does not show data for the year 2023 as the year is still going on but can
expect the trend to going on upwards.
Porche is most expensive of all brands. We may use the treemap below to understand the dispersion of different
model counts because the average MSRP greatly depends on the number of instances and the number of models each Make has.
Since this analysis is based on the 3% of actual data, it is impractical to set the actuall price. (The analysis is to
show the way to visualize this similar kind of problems.) There are two types of EVs with Battery Electric Vehicle being
the dominant one. BEV have also far superior range averaging around 200 miles while PHEV have range of only 20-50 miles.
We here do not have enough data to prove, but according to present data, PHEV are much cheaper than BEV. Barely 12% of
the vehicles do not have clean alternative feul eligibility which indicates a greener initiative.